A POMDP Extension with Belief-dependent Rewards
Abstract
Partially Observable Markov Decision Processes (POMDPs) model sequential decision-making problems under uncertainty and partial observability. Unfortunately, some problems cannot be modeled with state-dependent reward functions, e.g., problems whose objective explicitly involves reducing the uncertainty about the state. To that end, we introduce ρPOMDPs, an extension of POMDPs in which the reward function ρ depends on the belief state. We show that, under the common assumption that ρ is convex, the value function is also convex, which makes it possible to (1) approximate ρ arbitrarily well with a piecewise linear and convex (PWLC) function, and (2) use state-of-the-art exact or approximate solving algorithms with limited changes.
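As an illustration of the PWLC idea, here is a minimal sketch (assuming NumPy; `neg_entropy`, `pwlc_alphas`, and `pwlc_reward` are illustrative names, not the paper's code). It lower-bounds one convex belief-dependent reward, the negative entropy, by tangent alpha-vectors taken at sampled belief points; by convexity, each tangent plane under-approximates ρ everywhere, and the maximum over planes tightens as more points are sampled.

```python
import numpy as np

def neg_entropy(b):
    """rho(b) = sum_s b(s) log b(s): a convex belief-dependent reward."""
    b = np.asarray(b, float)
    nz = b > 0
    return float(np.sum(b[nz] * np.log(b[nz])))

def pwlc_alphas(belief_points, eps=1e-12):
    """One tangent alpha-vector per sampled belief: at point p, alpha = log p.
    By Gibbs' inequality, alpha . b <= neg_entropy(b) for every belief b,
    with equality at b = p, so each alpha is a tangent plane to rho."""
    return [np.log(np.clip(np.asarray(p, float), eps, 1.0)) for p in belief_points]

def pwlc_reward(b, alphas):
    """PWLC lower approximation of rho: the max over the tangent planes."""
    return max(float(np.dot(a, b)) for a in alphas)

# Two-state example: denser sampling of the simplex tightens the bound.
points = [np.array([t, 1.0 - t]) for t in np.linspace(0.05, 0.95, 10)]
alphas = pwlc_alphas(points)
b = np.array([0.3, 0.7])
print(pwlc_reward(b, alphas), "<=", neg_entropy(b))   # -0.616... <= -0.611...
```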
Similar resources
Potential-Based Reward Shaping for POMDPs (Extended Abstract)
We address the problem of suboptimal behavior caused by short horizons during online POMDP planning. Our solution extends potential-based reward shaping from the related field of reinforcement learning to online POMDP planning in order to improve planning without increasing the planning horizon. In our extension, information about the quality of belief states is added to the function optimized ...
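The abstract leaves the shaping function unspecified; the sketch below shows the standard potential-based form carried over to belief states, with a negative-entropy potential as one plausible belief-quality measure (the function names and the choice of potential are assumptions, not the authors' implementation).

```python
import numpy as np

def certainty_potential(b, eps=1e-12):
    """Illustrative potential Phi(b): larger for more peaked (certain) beliefs."""
    b = np.clip(np.asarray(b, float), eps, 1.0)
    return float(np.sum(b * np.log(b)))

def shaped_reward(r, b, b_next, gamma=0.95, phi=certainty_potential):
    """Potential-based shaping over beliefs: R' = R + gamma * Phi(b') - Phi(b).
    In MDPs this form provably preserves the optimal policy (Ng et al., 1999)."""
    return r + gamma * phi(b_next) - phi(b)
```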
Efficient Decision-Theoretic Target Localization
Partially observable Markov decision processes (POMDPs) offer a principled approach to control under uncertainty. However, POMDP solvers generally require rewards to depend only on the state and action. This limitation makes them unsuitable for information-gathering problems, where rewards are more naturally expressed as functions of belief. In this work, we consider target localization, an information...
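A common way to express such a belief-dependent objective is to pick the sensing action that minimizes the expected posterior entropy. The sketch below shows this one-step greedy rule (hypothetical helper names; it assumes a static target, so the belief update omits the transition model).

```python
import numpy as np

def entropy(b, eps=1e-12):
    b = np.clip(np.asarray(b, float), eps, 1.0)
    return -float(np.sum(b * np.log(b)))

def belief_update(b, O_a, z):
    """Bayes correction for a static target: O_a[z, s] = Pr(z | s, a)."""
    post = O_a[z] * b
    return post / post.sum()

def greedy_info_action(b, obs_models):
    """Choose the sensing action minimizing expected posterior entropy,
    i.e. maximizing a belief-dependent (information-gain) reward."""
    def expected_entropy(O_a):
        pz = O_a @ b                      # Pr(z | b, a) = sum_s O_a[z, s] b(s)
        return sum(p * entropy(belief_update(b, O_a, z))
                   for z, p in enumerate(pz) if p > 0)
    return min(obs_models, key=lambda a: expected_entropy(obs_models[a]))
```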
Using Rewards for Belief State Updates in Partially Observable Markov Decision Processes
Partially Observable Markov Decision Processes (POMDPs) provide a standard framework for sequential decision making in stochastic environments. In this setting, an agent takes actions and receives observations and rewards from the environment. Many POMDP solution methods are based on computing a belief state, which is a probability distribution over the possible states in which the agent could be. T...
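As a sketch of how a received reward can sharpen the belief, the reward can be treated as one more observation in the Bayes filter. The helper below is illustrative, not the paper's exact update, and it assumes deterministic, state-and-action-dependent rewards.

```python
import numpy as np

def belief_update_with_reward(b, T_a, O_a, z, r, R_a, tol=1e-9):
    """Bayes update that also conditions on the observed reward r.
    T_a[s, s']: transition probs for action a; O_a[z, s']: Pr(z | s', a);
    R_a[s]: deterministic reward for taking a in state s (an assumption).
    States whose reward is inconsistent with r are filtered out."""
    reward_evidence = (np.abs(R_a - r) < tol).astype(float)
    predicted = (b * reward_evidence) @ T_a   # predict over successor states
    posterior = O_a[z] * predicted            # correct with observation z
    return posterior / posterior.sum()
```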
Evaluating POMDP rewards for active perception
One popular approach to active perception is using POMDPs to maximize rewards received for sensing actions, toward accomplishing tasks and/or continually refining the agent's knowledge. Multiple types of reward functions have been proposed to achieve these goals: (1) state-based rewards, which minimize sensing costs and maximize task rewards, (2) belief-based rewards, which maximize belief state ...
Optimization of Prostate Biopsy Referral Decisions
Prostate cancer is the most common solid tumor in American men and is screened for using prostate-specific antigen (PSA) tests. We report on a non-stationary partially observable Markov decision process (POMDP) for prostate biopsy referral decisions. The core states are the patients’ prostate cancer related health states, and PSA test results are the observations. Transition probabilities and r...
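A non-stationary POMDP like this is often encoded with epoch-indexed models. The sketch below is a hypothetical container reflecting the description above (age-dependent dynamics, health states as core states, PSA results as observations), not the authors' formulation.

```python
from dataclasses import dataclass
from typing import List
import numpy as np

@dataclass
class NonStationaryPOMDP:
    """Epoch-indexed POMDP: one model per decision epoch (e.g. patient age)."""
    states: List[str]         # cancer-related health states (core states)
    observations: List[str]   # discretized PSA test results
    actions: List[str]        # e.g. "defer", "refer_biopsy"
    T: List[np.ndarray]       # T[t][a, s, s']: transition probs at epoch t
    O: List[np.ndarray]       # O[t][a, z, s']: observation probs at epoch t
    R: List[np.ndarray]       # R[t][a, s]: rewards at epoch t
```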
Publication date: 2010